    Cue Integration in Categorical Tasks: Insights from Audio-Visual Speech Perception

    Previous cue integration studies have examined continuous perceptual dimensions (e.g., size) and have shown that human cue integration is well described by a normative model in which cues are weighted in proportion to their sensory reliability, as estimated from single-cue performance. However, this normative model may not be applicable to categorical perceptual dimensions (e.g., phonemes). In tasks defined over categorical perceptual dimensions, optimal cue weights should depend not only on the sensory variance affecting the perception of each cue but also on the environmental variance inherent in each task-relevant category. Here, we present a computational and experimental investigation of cue integration in a categorical audio-visual (articulatory) speech perception task. Our results show that human performance during audio-visual phonemic labeling is qualitatively consistent with the behavior of a Bayes-optimal observer. Specifically, we show that the participants in our task are sensitive, on a trial-by-trial basis, to the sensory uncertainty associated with the auditory and visual cues during phonemic categorization. In addition, we show that while sensory uncertainty is a significant factor in determining cue weights, it is not the only one: participants' performance is consistent with an optimal model in which environmental, within-category variability also plays a role in determining cue weights. Furthermore, we show that in our task, the sensory variability affecting the visual modality during cue combination is not well estimated from single-cue performance, but can be estimated from multi-cue performance. The findings and computational principles described here represent a principled first step towards characterizing the mechanisms underlying human cue integration in categorical tasks.
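    The normative weighting scheme described in this abstract can be sketched in a few lines: each cue's weight is proportional to the inverse of its total variance, where in a categorical task the total variance includes both sensory variance and the shared within-category environmental variance. The variance values below are illustrative assumptions, not fitted parameters from the study.

```python
def reliability_weights(var_a, var_v, var_cat=0.0):
    """Normative cue weights: each cue is weighted by the inverse of its
    total variance (sensory variance plus, for categorical tasks,
    within-category environmental variance)."""
    rel_a = 1.0 / (var_a + var_cat)   # auditory reliability
    rel_v = 1.0 / (var_v + var_cat)   # visual reliability
    total = rel_a + rel_v
    return rel_a / total, rel_v / total

# Continuous-dimension model: weights depend on sensory variance alone.
w_a, w_v = reliability_weights(1.0, 4.0)            # auditory weight 0.8

# Categorical model: shared within-category variance pulls the weights
# toward equality, so sensory reliability alone no longer predicts them.
w_a_cat, w_v_cat = reliability_weights(1.0, 4.0, var_cat=4.0)
```

    Note how adding the same within-category variance to both cues reduces the advantage of the more reliable cue, which is why single-cue performance alone cannot predict categorical cue weights.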

    The time course of auditory and language-specific mechanisms in compensation for sibilant assimilation

    Models of spoken-word recognition differ on whether compensation for assimilation is language-specific or depends on general auditory processing. English and French participants were taught words that began or ended with the sibilants /s/ and /∫/. Both languages exhibit some assimilation in sibilant sequences (e.g., /s/ becomes like [∫] in dress shop and classe chargée), but they differ in the strength and predominance of anticipatory versus carryover assimilation. After training, participants were presented with novel words embedded in sentences, some of which contained an assimilatory context either preceding or following. A continuum of target sounds ranging from [s] to [∫] was spliced into the novel words, representing a range of possible assimilation strengths. Listeners' perceptions were examined using a visual-world eyetracking paradigm in which the listener clicked on pictures matching the novel words. We found two distinct language-general context effects: a contrastive effect when the assimilating context preceded the target, and a flattening of the sibilant categorization function (increased ambiguity) when the assimilating context followed. Furthermore, we found that English but not French listeners were able to resolve the ambiguity created by the following assimilatory context, consistent with their greater experience with assimilation in this context. The combination of these mechanisms allows listeners to deal flexibly with variability in speech forms.
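    The two context effects can be expressed with a logistic categorization (psychometric) function over the [s]-[∫] continuum: a preceding contrastive context shifts the category boundary, while a following context shallows the slope, increasing ambiguity. The boundary and slope values below are hypothetical, and the direction of the boundary shift is shown for illustration only.

```python
import math

def p_sh(x, boundary, slope):
    """Probability of a /sh/ response at continuum step x, modeled as a
    logistic psychometric function with a boundary and a slope."""
    return 1.0 / (1.0 + math.exp(-slope * (x - boundary)))

steps = range(8)  # an 8-step [s]-[sh] continuum

# Baseline categorization with no assimilatory context.
baseline = [p_sh(x, boundary=4.0, slope=1.5) for x in steps]

# Preceding context: contrastive effect modeled as a boundary shift.
contrast = [p_sh(x, boundary=3.0, slope=1.5) for x in steps]

# Following context: flattening modeled as a shallower slope, which
# pushes responses toward 0.5 at both ends (increased ambiguity).
flattened = [p_sh(x, boundary=4.0, slope=0.5) for x in steps]
```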

    Plasticity of categories in speech perception and production

    While perceptual categories exhibit plasticity following recently heard speech, evidence of effects on production has been mixed. We tested the influences of perceptual plasticity on production with an implicit distributional learning paradigm. In Experiment 1, we exposed participants to an unlabelled bimodal distribution of voice onset time (VOT) for bilabial stop consonants, with a category boundary longer than is typical. Participants’ perceptual category boundaries shifted towards longer VOTs, with a congruent increase in production VOT. Experiment 2 found evidence that these shifts transfer to a different speaker and different syllables in perception, and to different words in production. Experiment 3 showed no shifts following exposure to a VOT boundary shorter than typical. We conclude that when listeners adjust their perceptual category boundaries, these changes may affect production categories, consistent with models in which speech perception and production categories are linked, with category boundaries influencing the link between perception and production.
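    The distributional-learning manipulation in Experiment 1 can be simulated: expose a learner to an unlabelled bimodal VOT distribution whose modes straddle a longer-than-typical boundary, and take the midpoint between the mode means as a simple boundary estimate. The mode locations (0 ms and 50 ms) and spread are illustrative assumptions, not the stimulus values used in the study.

```python
import random
import statistics

random.seed(0)  # make the simulated exposure reproducible

# Hypothetical bimodal VOT exposure (in ms): a short-lag, /b/-like mode
# and a long-lag, /p/-like mode placed longer than a typical boundary.
short_mode = [random.gauss(0.0, 5.0) for _ in range(200)]
long_mode = [random.gauss(50.0, 5.0) for _ in range(200)]
exposure = short_mode + long_mode   # unlabelled: categories never named

# A minimal boundary estimate: the midpoint between the two mode means.
# A listener tracking this distribution would place the boundary near
# 25 ms, longer than a typical English bilabial boundary.
boundary = (statistics.mean(short_mode) + statistics.mean(long_mode)) / 2
```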

    Does high variability training improve the learning of non-native phoneme contrasts over low variability training? A replication

    Acquiring non-native speech contrasts can be difficult. A seminal study by Logan, Lively and Pisoni (1991) established the effectiveness of phonetic training for improving non-native speech perception: Japanese learners of English were trained to perceive /r/-/l/ using minimal pairs over 15 training sessions. A pre-/post-test design established learning and generalisation. In a follow-up study, Lively, Logan and Pisoni (1993) presented further evidence suggesting that talker variability in the training stimuli was crucial for greater generalisation. These findings have been very influential, and “high variability phonetic training” is now a standard methodology in the field. However, while the general benefit of phonetic training is well replicated, the evidence for an advantage of high- over low-variability training remains mixed. In a large-scale replication of the original studies using updated statistical analyses, we test whether learners generalise more after phonetic training with multiple talkers than with a single talker. We find that listeners learn in both the multiple- and single-talker conditions. However, during training we find no difference in how well listeners learn under high- versus low-variability training. When comparing generalisation to novel talkers after training, relative to pre-training accuracy, we find ambiguous evidence for a high-variability benefit: if such a benefit exists, the effect is much smaller than originally thought, such that it cannot be detected in our sample of 166 listeners.
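    The pre-/post-test logic of the replication can be sketched as a comparison of accuracy gains between the two training conditions. The accuracy values below are invented placeholders for illustration, not data from the study.

```python
# Hypothetical per-listener pre- and post-test accuracies on novel
# talkers, grouped by training condition (values are placeholders).
pre = {"multi": [0.62, 0.58, 0.65], "single": [0.60, 0.63, 0.59]}
post = {"multi": [0.78, 0.74, 0.80], "single": [0.75, 0.77, 0.73]}

def mean(xs):
    return sum(xs) / len(xs)

# Generalisation gain per condition: post-test minus pre-test accuracy.
gain = {cond: mean(post[cond]) - mean(pre[cond]) for cond in pre}

# A high-variability benefit would appear as a reliably larger gain in
# the multi-talker condition; the replication's question is whether any
# such difference is distinguishable from zero.
benefit = gain["multi"] - gain["single"]
```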